Add GAX client_config to service factory #848
Conversation
@timanovsky This isn't Datastore, but this is the approach we are considering for configuring GRPC/GAX clients. Can you look at this and let us know what you think? Is the ability to configure the client at this level worth the risk of future breaking changes due to how leaky this approach is?
@jmuk Can you weigh in on this? How comfortable are you with gcloud-ruby leaking the GAX configuration this way? How likely are we to see breaking changes to the configuration?
In my feeling, the format is almost stable. It's unlikely to change, and I'd say it's safe to expose to the users.
@jmuk Are the code examples in the PR description accurate?
Yes, the format in the commit message is correct.
@bmclean Any input on this?
Coverage increased (+0.001%) to 95.656% when pulling 378c09e00c694589025ff1aebc32e481b23158d0 on blowmage:language-client_config into 956fbba on GoogleCloudPlatform:master.
@blowmage I like the approach of defining which error codes get retried. We currently retry automatically at the application level. I don't know if we would make use of the fine-grained control proposed in #849. For Datastore calls critical to our business I think we would want to keep the retry at the application level.
@bmclean: Thanks for the input!
@timanovsky Can you look this over and give us your thoughts on exposing this level of configuration? This would replace the current retries setting.
I don't see how to specify the number of retries. I see a time deadline, but not the number of attempts desired. Also, can you provide a sensible default list of exceptions to retry on? My understanding of which errors are retryable comes from https://cloud.google.com/datastore/docs/concepts/errors#error_codes (not sure about the last one).
Perhaps @jmuk can give us some more explanation of the configuration and what the error strings are that can be set for retry.
Right, that's the change with this new configuration. There is no single value for the number of retries. The number of retries would have to be calculated from the following parameters:
"retry_params" => {
"default" => {
"initial_retry_delay_millis" => 100,
"retry_delay_multiplier" => 1.3,
"max_retry_delay_millis" => 60000,
"initial_rpc_timeout_millis" => 60000,
"rpc_timeout_multiplier" => 1.0,
"max_rpc_timeout_millis" => 60000,
"total_timeout_millis" => 600000
}
}
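To make the point concrete, an effective attempt count can be derived from these parameters. This is a back-of-the-envelope sketch, not GAX code; it assumes retries continue until total_timeout_millis is exhausted and ignores the time spent in the RPCs themselves:

```ruby
# Replay the retry schedule implied by the retry_params above.
# Not GAX code; a sketch under the assumptions stated in the lead-in.
def estimated_attempts params
  delay    = params["initial_retry_delay_millis"].to_f
  elapsed  = 0.0
  attempts = 0
  while elapsed + delay <= params["total_timeout_millis"]
    elapsed  += delay
    attempts += 1
    # Delay grows geometrically, capped at max_retry_delay_millis.
    delay = [delay * params["retry_delay_multiplier"],
             params["max_retry_delay_millis"]].min
  end
  attempts
end

params = {
  "initial_retry_delay_millis" => 100,
  "retry_delay_multiplier"     => 1.3,
  "max_retry_delay_millis"     => 60_000,
  "total_timeout_millis"       => 600_000
}

estimated_attempts params  # roughly 30 attempts with these defaults
```

So under the defaults the delay ramps from 100 ms up to the 60-second cap and then repeats at that cap until the 10-minute total timeout is spent; the attempt count falls out of those values rather than being set directly.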
@timanovsky For reference, here is the entire GAX JSON configuration for Datastore, since you are likely most familiar with it:
{
"interfaces": {
"google.datastore.v1.Datastore": {
"retry_codes": {
"retry_codes_def": {
"idempotent": [
"DEADLINE_EXCEEDED",
"UNAVAILABLE"
],
"non_idempotent": []
}
},
"retry_params": {
"default": {
"initial_retry_delay_millis": 100,
"retry_delay_multiplier": 1.3,
"max_retry_delay_millis": 60000,
"initial_rpc_timeout_millis": 60000,
"rpc_timeout_multiplier": 1.0,
"max_rpc_timeout_millis": 60000,
"total_timeout_millis": 600000
}
},
"methods": {
"Lookup": {
"timeout_millis": 60000,
"retry_codes_name": "idempotent",
"retry_params_name": "default"
},
"RunQuery": {
"timeout_millis": 60000,
"retry_codes_name": "idempotent",
"retry_params_name": "default"
},
"BeginTransaction": {
"timeout_millis": 60000,
"retry_codes_name": "non_idempotent",
"retry_params_name": "default"
},
"Commit": {
"timeout_millis": 60000,
"retry_codes_name": "non_idempotent",
"retry_params_name": "default"
},
"Rollback": {
"timeout_millis": 60000,
"retry_codes_name": "non_idempotent",
"retry_params_name": "default"
},
"AllocateIds": {
"timeout_millis": 60000,
"retry_codes_name": "non_idempotent",
"retry_params_name": "default"
}
}
}
}
}
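As a reading aid for the config above, each method names a retry-code set via retry_codes_name, which resolves through retry_codes_def. The resolver below is a sketch for illustration, not the GAX lookup code:

```ruby
# A condensed slice of the config above, kept to the keys involved
# in resolving a method's retryable codes.
config = {
  "interfaces" => {
    "google.datastore.v1.Datastore" => {
      "retry_codes" => {
        "retry_codes_def" => {
          "idempotent"     => ["DEADLINE_EXCEEDED", "UNAVAILABLE"],
          "non_idempotent" => []
        }
      },
      "methods" => {
        "Lookup" => { "retry_codes_name" => "idempotent" },
        "Commit" => { "retry_codes_name" => "non_idempotent" }
      }
    }
  }
}

# Follow retry_codes_name from the method entry to retry_codes_def.
def retry_codes_for config, interface, method
  iface = config["interfaces"][interface]
  name  = iface["methods"][method]["retry_codes_name"]
  iface["retry_codes"]["retry_codes_def"][name]
end

retry_codes_for config, "google.datastore.v1.Datastore", "Lookup"
# => ["DEADLINE_EXCEEDED", "UNAVAILABLE"]
```

So Lookup and RunQuery (idempotent) retry on DEADLINE_EXCEEDED and UNAVAILABLE, while Commit and the other non-idempotent methods retry on nothing.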
@timanovsky The way I read the configuration, the time between retries will increment until the max is reached, and then it will keep retrying at that max interval. But there doesn't seem to be any mechanism for total number of retries. @jmuk can you confirm?
I'm not certain about this, but
This is quite an uncommon approach. If it remains so, I guess I will need to keep using application-level retry logic.
@jmuk Regarding @timanovsky's comment above, is there any flexibility in the design of the GAX retry behavior?
The interval between retries is increased exponentially (by retry_delay_multiplier). Other errors are normally not retried automatically; developers will have to look into the details to fix the problem (and so they are documented as "Do not retry without fixing the problem"). ABORTED can be added for some methods too, although I'm not sure it's actually used for non-transactional APIs.
The config has a 'retry_codes_def' section which specifies the retryable errors, but the retry settings are the same for all failures of a method; per-failure retry logic is not implemented (and that would be complicated when combined with exponential backoff).
Thanks @jmuk. Is there a better list of all the gRPC error codes than this?
How does
@jmuk Any input?
@@ -110,7 +110,7 @@ def self.language project = nil, keyfile = nil, scope: nil, retries: nil,
   end
   Google::Cloud::Language::Project.new(
     Google::Cloud::Language::Service.new(
-      project, credentials, retries: retries, timeout: timeout))
+      project, credentials, timeout: timeout, client_config: client_config))
Sorry for the delay. Yes, that link will be the reference for gRPC errors, but in the context of Google API clients, https://cloud.google.com/datastore/docs/concepts/errors#error_codes would be the better reference.
If it's set as 5 seconds and the server does not return in that time,
Thanks @jmuk, that is perfect. I still think this approach has more value than using a fixed retry parameter. It gives you exponential back-off on the desired error codes (I am thinking of
@blowmage I am now thinking that we would also have use cases for the per-API-request approach outlined in #849.
Replace the retries value with the GAX client configuration. This leaks the GAX implementation to the google-cloud public API. If GAX changes its implementation, this may result in a change to the google-cloud public API.
Force-pushed from 378c09e to d448dbf
After considering the options, I agree that this is probably the best way forward. I am going to merge this PR, and I think we should use it as the basis for configuring retries in other services that we convert to GAX.
GAX does not directly support an integer retries value. [refs googleapis#848]
Language is the first service using GAX. GAX implements retries and incremental backoff, and our original plan was to make it conform to the retries value used by other services. However, GAX is different enough that it probably isn't feasible to share the same configuration approach as Google API Client. Therefore, here is an option for allowing users to configure the GAX client while continuing to use Google Cloud. See also #849 for overriding the GAX configuration on API calls. Either approach is independent of the other.

This change replaces retries with client_config; it effectively leaks the GAX implementation to the Google Cloud Language public API. This means that if GAX ever changes its implementation, Google Cloud Language may also need to make a breaking change to support the new GAX API. So far we have resisted leaking these types of details, but there is no better way to allow the GAX client to be fully configurable.

What do you think of this type of change? Is this a direction you want to see the other GRPC/GAX services move towards?

The client_config hash can contain any part of the config JSON that is intended to be overridden. Here is what this may look like:
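A sketch of such a partial override follows. The interface and method names here are assumptions for illustration, and the override is assumed to be deep-merged over the service's default config, so only the keys you set change:

```ruby
# Hypothetical partial override; keys follow the GAX JSON config
# format shown earlier in the thread. Names are illustrative.
client_config = {
  "interfaces" => {
    "google.cloud.language.v1.LanguageService" => {
      "methods" => {
        "AnnotateText" => { "timeout_millis" => 10_000 }
      }
    }
  }
}

# A plain deep merge sketching the intended "override only what you
# set" semantics over a defaults hash.
def deep_merge defaults, overrides
  defaults.merge(overrides) do |_key, old_val, new_val|
    if old_val.is_a?(Hash) && new_val.is_a?(Hash)
      deep_merge old_val, new_val
    else
      new_val
    end
  end
end
```

The resulting hash would then be passed to the factory as the client_config keyword, as in the diff above.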